Rethinking Pointer Reasoning in Symbolic Execution
Symbolic execution is a popular program analysis technique that allows searching for bugs by reasoning over multiple alternative execution states at once. As the number of states to explore may grow exponentially, a symbolic executor may quickly run out of space. For instance, a memory access to a symbolic address may potentially reference the entire address space, leading to a combinatorial explosion of the possible resulting execution states. To cope with this issue, state-of-the-art executors concretize symbolic addresses that span memory intervals larger than some threshold. Unfortunately, this could result in missing interesting execution states, e.g., where a bug arises. In this paper we introduce MemSight, a new approach to symbolic memory that reduces the need for concretization, hence offering the opportunity for broader state explorations and more precise pointer reasoning. Rather than mapping address instances to data as previous tools do, our technique maps symbolic address expressions to data, maintaining the possible alternative states resulting from the memory referenced by a symbolic address in a compact, implicit form. A preliminary experimental investigation on prominent benchmarks from the DARPA Cyber Grand Challenge shows that MemSight enables the exploration of states unreachable by previous techniques.
A Survey of Symbolic Execution Techniques
Many security and software testing applications require checking whether
certain properties of a program hold for any possible usage scenario. For
instance, a tool for identifying software vulnerabilities may need to rule out
the existence of any backdoor to bypass a program's authentication. One
approach would be to test the program using different, possibly random inputs.
As the backdoor may only be hit for very specific program workloads, automated
exploration of the space of possible inputs is of the essence. Symbolic
execution provides an elegant solution to the problem, by systematically
exploring many possible execution paths at the same time without necessarily
requiring concrete inputs. Rather than taking on fully specified input values,
the technique abstractly represents them as symbols, resorting to constraint
solvers to construct actual instances that would cause property violations.
Symbolic execution has been incubated in dozens of tools developed over the
last four decades, leading to major practical breakthroughs in a number of
prominent software reliability applications. The goal of this survey is to
provide an overview of the main ideas, challenges, and solutions developed in
the area, distilling them for a broad audience.
The present survey has been accepted for publication at ACM Computing
Surveys. If you are considering citing this survey, we would appreciate it if you
could use the following BibTeX entry: http://goo.gl/Hf5Fvc
Comment: This is the authors' pre-print copy.
SymFusion: Hybrid Instrumentation for Concolic Execution
Concolic execution is a dynamic twist of symbolic execution designed
with scalability in mind. Recent concolic executors heavily
rely on program instrumentation to achieve such scalability. The
instrumentation code can be added at compilation time (e.g., using
an LLVM pass), or directly at execution time with the help of a
dynamic binary translator. The former approach results in more efficient
code but requires recompilation. Unfortunately, recompiling
the entire code of a program is not always feasible or practical (e.g.,
in the presence of third-party components). In contrast, the latter
approach does not require recompilation but incurs significantly
higher execution time overhead.
In this paper, we investigate a hybrid instrumentation approach
for concolic execution, called SymFusion. In particular, this hybrid
instrumentation approach allows the user to recompile the core
components of an application, thus minimizing the analysis overhead
on them, while still being able to dynamically instrument the
rest of the application components at execution time. Our experimental
evaluation shows that our design can achieve a nice balance
between efficiency and efficacy on several real-world applications.
An Interactive Visualization Framework for Performance Analysis
Input-sensitive profiling is a recent methodology for analyzing how the performance of a routine scales as a function of the workload size. As increasingly detailed profiles are collected by an input-sensitive profiler, the information conveyed to a user can quickly become overwhelming. In this paper, we present an interactive graphical tool called aprof-plot for visualizing performance profiles. Exploiting curve fitting techniques, aprof-plot can estimate the asymptotic complexity of each routine, directing the programmer's attention to the most critical routines of an application. A variety of routine-based charts can be automatically generated by our tool, allowing the developer to analyze the performance scalability of a routine. Several examples based on real-world applications are discussed, showing how to conduct an effective performance investigation using aprof-plot.
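As a rough illustration of how curve fitting can estimate the asymptotic complexity of a routine, the sketch below fits a power law cost ≈ a * n^b to (workload size, cost) pairs by least squares in log-log space. It is not aprof-plot's actual implementation: the function name `fit_power_law`, the power-law model, and the sample profile are assumptions made only for this example.

```python
import math

def fit_power_law(sizes, costs):
    """Fit cost ~ a * n**b by least squares on log(cost) = log(a) + b*log(n).

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept gives log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Made-up profile of a quadratic routine: cost = 3 * n^2.
sizes = [10, 20, 40, 80, 160]
costs = [3 * n ** 2 for n in sizes]
a, b = fit_power_law(sizes, costs)
print(f"cost ~ {a:.2f} * n^{b:.2f}")
```

An exponent b close to 2 would flag the routine as roughly quadratic in its workload size. A real tool would likely try several model families (logarithmic, linearithmic, exponential) rather than a single power law.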
Fuzzing Symbolic Expressions
Recent years have witnessed a wide array of results in software testing,
exploring different approaches and methodologies ranging from fuzzers to
symbolic engines, with a full spectrum of instances in between such as concolic
execution and hybrid fuzzing. A key ingredient of many of these tools is
Satisfiability Modulo Theories (SMT) solvers, which are used to reason over
symbolic expressions collected during the analysis. In this paper, we
investigate whether techniques borrowed from the fuzzing domain can be applied
to check whether symbolic formulas are satisfiable in the context of concolic
and hybrid fuzzing engines, providing a viable alternative to classic SMT
solving techniques. We devise a new approximate solver, FUZZY-SAT, and show
that it is both competitive with and complementary to state-of-the-art solvers
such as Z3 with respect to handling queries generated by hybrid fuzzers.
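To make the idea of fuzzing-based satisfiability checking concrete, here is a minimal sketch under simplifying assumptions. It does not reproduce FUZZY-SAT's algorithm: a symbolic formula is modeled as a plain Python predicate over input bytes, and the hypothetical `fuzz_solve` function mutates one byte of a seed input at a time until the predicate holds. As with any approximate solver of this kind, a `None` result means "no model found", not "unsatisfiable".

```python
import random

def fuzz_solve(predicate, seed, max_tries=50_000, rng=None):
    """Search for an input satisfying `predicate` by mutating `seed`.

    Hypothetical illustration: returns a satisfying bytes object, or None
    if none was found within `max_tries` attempts (unknown, not UNSAT).
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    if predicate(seed):
        return seed
    for _ in range(max_tries):
        cand = bytearray(seed)
        pos = rng.randrange(len(cand))
        op = rng.choice(("random", "flip", "add"))
        if op == "random":
            cand[pos] = rng.randrange(256)        # replace with a random byte
        elif op == "flip":
            cand[pos] ^= 1 << rng.randrange(8)    # flip a single bit
        else:
            cand[pos] = (cand[pos] + rng.choice((-1, 1))) % 256  # +/- 1
        cand = bytes(cand)
        if predicate(cand):
            return cand
    return None

# Made-up query: the seed already satisfies b[1] == b[2]; the negated
# branch additionally requires (b[0] ^ 0x5A) == 0x3C.
query = lambda b: (b[0] ^ 0x5A) == 0x3C and b[1] == b[2]
model = fuzz_solve(query, bytes(3))
print(model)
```

Starting from a seed mirrors the concolic setting, where the current input already satisfies most of the path constraints and only one branch condition has to be flipped.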
Hiding in the Particles: When Return-Oriented Programming Meets Program Obfuscation
Largely known for attack scenarios, code reuse techniques at a closer look
reveal properties that are appealing also for program obfuscation. We explore
the popular return-oriented programming paradigm under this light, transforming
program functions into ROP chains that coexist seamlessly with the surrounding
software stack. We show how to build chains that can withstand popular static
and dynamic deobfuscation approaches, evaluating the robustness and overheads
of the design over common programs. The results suggest a significant amount of
computational resources would be required to carry out a deobfuscation attack for
secret finding and code coverage goals.
Comment: Published in the proceedings of DSN'21 (51st IEEE/IFIP Int. Conf. on
Dependable Systems and Networks). Code and BibTeX entry available at
https://github.com/pietroborrello/raindro
WEIZZ: Automatic Grey-box Fuzzing for Structured Binary Formats
Fuzzing technologies have evolved at a fast pace in recent years, revealing
bugs in programs with ever increasing depth and speed. Applications working
with complex formats are however more difficult to take on, as inputs need to
meet certain format-specific characteristics to get through the initial parsing
stage and reach deeper behaviors of the program. Unlike prior proposals based
on manually written format specifications, in this paper we present a technique
to automatically generate and mutate inputs for unknown chunk-based binary
formats. We propose a technique to identify dependencies between input bytes
and comparison instructions, and later use them to assign tags that
characterize the processing logic of the program. Tags become the building
block for structure-aware mutations involving chunks and fields of the input.
We show that our technique performs comparably to structure-aware fuzzing
proposals that require human assistance. Our prototype implementation WEIZZ
revealed 16 unknown bugs in widely used programs.
On the Dissection of Evasive Malware
Complex malware samples feature measures to impede automatic and manual analyses, making their investigation cumbersome. While automatic characterization of malware benefits from recently proposed designs for passive monitoring, the subsequent dissection process still sees human analysts struggling with adversarial behaviors, many of which also closely resemble those studied for automatic systems. This gap affects the day-to-day analysis of complex samples and researchers have not yet attempted to bridge it. We make a first step down this road by proposing a design that can reconcile transparency requirements with manipulation capabilities required for dissection. Our open-source prototype BluePill (i) offers a customizable execution environment that remains stealthy when analysts intervene to alter instructions and data or run third-party tools, (ii) is extensible to counteract newly encountered anti-analysis measures using insights from the dissection, and (iii) can accommodate program analyses that aid analysts, as we explore for taint analysis. On a set of highly evasive samples, BluePill proved as stealthy as commercial sandboxes while offering new intervention and customization capabilities for dissection.
Automatic Performance Testing using Input-Sensitive Profiling
During performance testing, software engineers commonly perform application profiling to analyze an application's traces with different inputs to understand performance behaviors, such as time and space consumption. However, a non-trivial application commonly has a large number of inputs, and identifying the specific inputs leading to performance bottlenecks is mostly manual. Thus, it is challenging to automate profiling and find these specific inputs. To solve these problems, we propose novel approaches, FOREPOST, GA-Prof and PerfImpact, which automatically profile applications for finding the specific combinations of inputs triggering performance bottlenecks, and further analyze the corresponding traces to identify problematic methods. Specifically, our approaches work in two different types of real-world scenarios of performance testing: i) a single-version scenario, in which performance bottlenecks are detected in a single software release, and ii) a two-version scenario, in which code changes responsible for performance regressions are detected by considering two consecutive software releases.